Building a Cross-Language Entity Linking Collection in Twenty-One Languages
نویسندگان
چکیده
We describe an efficient way to create a test collection for evaluating the accuracy of cross-language entity linking. Queries are created by semiautomatically identifying person names on the English side of a parallel corpus, using judgments obtained through crowdsourcing to identify the entity corresponding to the name, and projecting the English name onto the non-English document using word alignments. We applied the technique to produce the first publicly available multilingual cross-language entity linking collection. The collection includes approximately 55,000 queries, comprising between 875 and 4,329 queries for each of twenty-one non-English languages.
منابع مشابه
Creating and Curating a Cross-Language Person-Entity Linking Collection
To stimulate research in cross-language entity linking, we present a new test collection for evaluating the accuracy of cross-language entity linking in twenty-one languages. This paper describes an efficient way to create and curate such a collection, judiciously exploiting existing language resources. Queries are created by semi-automatically identifying person names on the English side of a ...
متن کاملCross-Language Entity Linking
There has been substantial recent interest in aligning mentions of named-entities in unstructured texts to knowledge base descriptors, a task commonly called entity linking. This technology is crucial for applications in knowledge discovery and text data mining. This paper presents experiments in the new problem of crosslanguage entity linking, where documents and named entities are in a differ...
متن کاملCross-Language Person-Entity Linking from Twenty Languages
The goal of entity linking is to associate references to some entity that are found in unstructured natural language content to an authoritative inventory of known entities. This paper describes the construction of six test collections for cross-language person-entity linking that together span 22 languages. Fully automated components were used together with two crowdsourced validation stages t...
متن کاملHITS' Cross-lingual Entity Linking System at TAC 2011: One Model for All Languages
This paper presents HITS’ system for crosslingual entity linking at TAC 2011. We approach the task in three stages: (1) context disambiguation to obtain a language-independent representation, (2) entity disambiguation, (3) clustering of the queries that have not been linked in the second step. For each of these steps one single model is trained and applied to both languages, i.e. English and Ch...
متن کاملCross Lingual Entity Linking with Bilingual Topic Model
Cross lingual entity linking means linking an entity mention in a background source document in one language with the corresponding real world entity in a knowledge base written in the other language. The key problem is to measure the similarity score between the context of the entity mention and the document of the candidate entity. This paper presents a general framework for doing cross lingu...
متن کامل